Multiversioning in the Store Queue Is the Root of All Store-forwarding Evil
Abstract
As semiconductor technologies have continued to scale according to Moore's Law, complexity, power consumption, and energy dissipation have become first-order considerations in microprocessor design. In out-of-order processors, store-load forwarding is a significant source of complexity and energy dissipation. To reduce the complexity and improve the energy efficiency of store-load forwarding, this thesis proposes the forwarding cache (FC), an address-indexed, set-associative alternative to the age-indexed, fully associative store queue (SQ). The SQ is a content-addressable memory (CAM) that holds in-flight stores in program order. Because the SQ is age-indexed, a load's address may match one or more stores located anywhere in the SQ. The SQ search must therefore be fully associative and priority-encoded, selecting the youngest matching store that is older than the load. In today's wide-issue processors, the SQ is large (24 to 32 entries), multiported (to accommodate the issue of multiple loads in a single cycle), and fast (no slower than an L1 data cache hit). The energy and complexity required to perform a fast search in a highly associative, multiported CAM are substantial.

The contributions of this work are as follows. First, this thesis shows empirically that address multiversioning (multiple in-flight stores to the same address) and the accompanying age-ordered, priority-encoded search are rarely necessary to perform store-load forwarding correctly. While others have observed the same empirical result on a particular processor configuration using a particular load speculation policy, this thesis extends the analysis to a broad variety of processors using several load speculation policies. Second, this thesis proposes the FC, an address-indexed, set-associative cache that performs store-load forwarding. Third, this thesis investigates the sensitivity of the FC's performance and energy dissipation to several design parameters, including size, associativity, number of banks, and number of ports.
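To make the age-ordered, priority-encoded search concrete, here is a minimal software sketch of an SQ. It is not the thesis's design: the class names (`StoreEntry`, `StoreQueue`), the sequence-number scheme, and the restriction to aligned, full-width accesses are all illustrative assumptions. Hardware compares the load address against every entry in parallel within one cycle; the loop below only reproduces the result of that search. The hypothetical `versions` helper counts how many older in-flight stores share the load's address, and a count above one is the multiversioning case the abstract says is rare.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoreEntry:
    seq: int   # program-order sequence number (smaller = older); assumed scheme
    addr: int  # store address; assumes aligned, full-width accesses only
    data: int


class StoreQueue:
    """Hypothetical age-ordered store queue model (illustration only)."""

    def __init__(self, capacity: int = 32):
        self.capacity = capacity
        self.entries: list[StoreEntry] = []  # program order, oldest first

    def insert(self, entry: StoreEntry) -> None:
        # A full SQ would stall dispatch in a real pipeline.
        if len(self.entries) >= self.capacity:
            raise RuntimeError("SQ full: dispatch would stall")
        self.entries.append(entry)

    def retire_oldest(self) -> StoreEntry:
        # Stores leave the SQ in program order when they commit.
        return self.entries.pop(0)

    def search(self, load_seq: int, load_addr: int) -> Optional[int]:
        # Fully associative: every entry is a candidate.
        # Priority encoding: scan youngest-to-oldest and take the first
        # (i.e. youngest) matching store that is older than the load.
        for e in reversed(self.entries):
            if e.seq < load_seq and e.addr == load_addr:
                return e.data
        return None  # no match: the load reads the L1 data cache

    def versions(self, load_seq: int, load_addr: int) -> int:
        # Older in-flight stores to this address; > 1 means multiversioning.
        return sum(1 for e in self.entries
                   if e.seq < load_seq and e.addr == load_addr)
```

For example, after stores with sequence numbers 1, 2, and 3 to addresses `0x100`, `0x200`, and `0x100`, a load with sequence number 4 to `0x100` receives the data of store 3, while a load with sequence number 2 to the same address receives the data of store 1. Both versions of `0x100` must be kept to answer both loads correctly.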
The results show that a small, simple, set-associative FC performs comparably to the complex, fully associative SQ on both control-intensive and scientific workloads, while dissipating nearly an order of magnitude less energy than the SQ.
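By contrast, an address-indexed, set-associative structure only searches the ways of one set and keeps a single version per address. The sketch below is a hypothetical illustration of that idea, not the FC as specified in the thesis. The set-index function, the 8-byte granularity, the choice to overwrite an older version when a younger store writes the same address, the "stall on set conflict" policy, and the "replay" outcome are all assumptions made for this example; the abstract does not state how the FC handles these cases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FCEntry:
    addr: int
    data: int
    seq: int  # age of the youngest store written to this address


class ForwardingCache:
    """Hypothetical address-indexed, set-associative forwarding cache.
    Keeps only one version (the youngest store) per address."""

    def __init__(self, num_sets: int = 8, ways: int = 2, line_bytes: int = 8):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        self.sets: list[list[FCEntry]] = [[] for _ in range(num_sets)]

    def _set_for(self, addr: int) -> list[FCEntry]:
        # Address-indexed: a lookup touches one set, not every in-flight store.
        return self.sets[(addr // self.line_bytes) % self.num_sets]

    def store(self, seq: int, addr: int, data: int) -> bool:
        s = self._set_for(addr)
        for e in s:
            if e.addr == addr:
                # Same address: overwrite, so the older version is lost.
                e.data, e.seq = data, seq
                return True
        if len(s) >= self.ways:
            return False  # set conflict: assumed to stall the store
        s.append(FCEntry(addr, data, seq))
        return True

    def load(self, load_seq: int, addr: int) -> tuple[str, Optional[int]]:
        for e in self._set_for(addr):
            if e.addr == addr:
                if e.seq < load_seq:
                    return ("forward", e.data)
                # A younger store holds the entry. The FC cannot tell whether
                # the load needed an overwritten older version or the cache,
                # so it conservatively requests a replay.
                return ("replay", None)
        return ("cache", None)  # no in-flight store: read the L1 data cache

    def commit(self, seq: int, addr: int) -> None:
        # Free the entry only if no younger store has overwritten it since.
        s = self._set_for(addr)
        s[:] = [e for e in s if not (e.addr == addr and e.seq == seq)]
```

Replaying the SQ example: after stores 1 and 3 to `0x100`, a load with sequence number 4 is forwarded the data of store 3, but a load with sequence number 2 gets `("replay", None)` because store 1's version was overwritten. That replay path is exactly where the SQ's multiversioning would have been needed. If the thesis's empirical result holds, the path is rarely taken, which is what allows the simpler structure.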
Similar resources
Scaling Load-Store Queue
In order to tolerate long-latency instructions, large load and store queues are necessary to bypass in-flight information to dependent instructions; but as latency goes up, the size of the load and store queues increases as well, which impacts cycle time, area, and power. Hierarchical designs in [2] and [10] were proposed to alleviate the cycle-time problem, but the CAM and search functions r...
Presenting a Model for Antecedents and Consequences of Customer In-Store Experience
The aim of this study is to investigate the in-store antecedents and consequences of customer experience in Hyperstar stores. Purchase intention, store environment, and characteristics of employees were investigated as antecedents affecting in-store experience, while diversion purchasing and customer satisfaction were considered as its consequences. Two studies were designed to test the hypotheses. Th...
A High-Bandwidth Load-Store Unit for Single- and Multi-Threaded Processors
A store queue (SQ) is a critical component of the load execution machinery. High-ILP processors require high load execution bandwidth, but providing high-bandwidth SQ access is difficult. Address banking, which works well for caches, conflicts with age ordering, which is required for the SQ, and multi-porting exacerbates the latency of the associative searches that load execution require...
Customer lifetime value model in an online toy store
Businesses all around the world use different approaches to know their customers, segment them, and formulate suitable strategies for them. One of these approaches is calculating the value of each customer to the company. In this paper, by calculating Customer Lifetime Value (CLV) for individual customers of an online toy store named Alakdolak, three customer segments are extracted. The level of ...
Simulation of Store Separation using Low-cost CFD with Dynamic Meshing
The simulation of store separation using automatic coupling of the dynamic equations with flow aerodynamics is addressed. Precision and cost (calculation time) were used as the criteria for comparison. The method used in the present research decreased the calculation cost while keeping the solution error within a specific, tolerable interval. The methods applied to model the aerodynamic force...